[00:01.370 --> 00:05.750]  So, welcome to my talk on Kubernetes called Attacking the Helmsman.
[00:05.910 --> 00:09.890]  So, as Joan just said, I'm Mohit. I'm a security consultant at F-Secure.
[00:10.430 --> 00:13.410]  I'm primarily a pen tester, but I also help our clients with, you know,
[00:13.410 --> 00:17.970]  architectural diagrams and stuff from a security point of view.
[00:18.330 --> 00:20.470]  So, a little bit about what we're going to be talking about today.
[00:20.470 --> 00:23.470]  So, I'll start on a bit of background of Kubernetes,
[00:24.010 --> 00:26.830]  and then moving to the management plane, you know, the different components
[00:26.830 --> 00:29.530]  within there, how that gets, you know, affected by it being
[00:29.530 --> 00:34.790]  within the cloud, and then a bunch of different security features
[00:34.790 --> 00:36.930]  within Kubernetes and resources.
[00:38.450 --> 00:40.550]  So, a little bit about Kubernetes.
[00:40.610 --> 00:43.730]  So, Kubernetes, at least I've been told, is Greek for helmsman.
[00:44.410 --> 00:47.930]  When I heard about this, I asked one of my Greek colleagues about this,
[00:47.930 --> 00:50.870]  and he wasn't so sure about that. At least it didn't click in his head.
[00:51.410 --> 00:55.610]  Later on, he came across Istio, which also happens to follow
[00:55.610 --> 00:58.390]  that same sort of Greek, you know, naming convention.
[00:58.390 --> 01:01.110]  And at that point, something clicked, and he was like, no, okay,
[01:01.110 --> 01:03.810]  so it may be a thing, however, it's spelled a little bit differently,
[01:03.810 --> 01:05.430]  and he pronounced it Kee-bee-nee-tees.
[01:05.910 --> 01:07.970]  I hope I haven't butchered that.
[01:08.550 --> 01:11.190]  For those of you who don't know, Kubernetes is a container
[01:11.650 --> 01:15.350]  orchestration tool. It essentially helps you manage a large number
[01:15.350 --> 01:17.890]  of containers over potentially a large number of hosts,
[01:18.730 --> 01:21.830]  and it provides a lot of features to aid in that kind of management,
[01:21.830 --> 01:26.010]  and you providing a desired state, and Kubernetes tries to maintain it.
[01:26.010 --> 01:29.470]  And recently, it's also started moving into multiple clouds.
[01:29.570 --> 01:32.470]  I remember, I think, a couple of KubeCons ago,
[01:32.470 --> 01:34.390]  it was talked about quite heavily there.
[01:35.370 --> 01:39.250]  And Kubernetes is, I think, one of the more popular
[01:39.250 --> 01:42.170]  container orchestration tools I've seen.
[01:42.170 --> 01:44.610]  At least I haven't seen something that's more popular.
[01:44.830 --> 01:45.750]  So, yeah.
[01:46.350 --> 01:48.490]  So a little bit about the management plane.
[01:48.490 --> 01:50.630]  Now, the top box here is the management plane,
[01:50.630 --> 01:52.850]  and it is essentially what helps manage the cluster
[01:52.850 --> 01:55.050]  from an automation perspective.
[01:55.050 --> 01:58.090]  So at the beginning, you have the API server.
[01:58.090 --> 02:02.070]  The API server is effectively a stateless REST-based API
[02:02.590 --> 02:06.490]  that is used for users to interact with,
[02:06.490 --> 02:10.130]  and it's kind of like the front end of the management plane,
[02:10.130 --> 02:12.490]  which stores its data within ETCD.
[02:12.990 --> 02:15.290]  You also have the controller manager and the scheduler.
[02:15.290 --> 02:19.630]  The scheduler is the component that kind of watches for new pods,
[02:19.630 --> 02:22.450]  which are the smallest item within Kubernetes.
[02:22.530 --> 02:25.030]  When one is created and it doesn't have a node assigned,
[02:25.030 --> 02:26.990]  it'll assign it to a node to deploy to,
[02:26.990 --> 02:29.390]  and the controller manager is kind of like the brains
[02:29.390 --> 02:32.170]  in the background, just as things are happening,
[02:32.170 --> 02:34.730]  monitors their states and deploys things or updates things
[02:34.730 --> 02:38.590]  as needed to make sure it's at where we,
[02:38.590 --> 02:41.490]  or where the users desire the state of the cluster to be.
[02:41.770 --> 02:44.090]  And then outside the management plane,
[02:44.090 --> 02:45.390]  you have the various nodes.
[02:45.390 --> 02:47.670]  These are the ones that actually do most of the work,
[02:47.670 --> 02:50.970]  running all the containers for the various workloads.
[02:51.310 --> 02:53.210]  Typically, each node contains a kubelet,
[02:53.210 --> 02:56.230]  which kind of manages the pods on that node.
[02:56.250 --> 02:58.630]  There's usually some sort of networking with kube-proxy,
[02:58.630 --> 03:02.710]  which does like network-based routing and configuration.
[03:03.070 --> 03:04.830]  And of course, you're going to have your container runtime
[03:04.830 --> 03:06.430]  and your base OS.
[03:08.910 --> 03:11.170]  So digging a little bit more into the API server.
[03:11.170 --> 03:14.110]  So as I said earlier, this is kind of like the front end
[03:14.110 --> 03:17.690]  of the cluster for people to communicate with.
[03:17.690 --> 03:22.050]  And it performs some additional features on top of just storing that data.
[03:22.050 --> 03:25.130]  Such as, you know, applying any RBAC policies that may apply.
[03:25.130 --> 03:27.350]  So, you know, if a user wishes to do something,
[03:27.350 --> 03:29.990]  it would first validate that they are allowed to actually do that.
[03:30.230 --> 03:33.330]  And also does additional data validation to make sure what it's been given
[03:33.330 --> 03:34.990]  does make sense.
[03:34.990 --> 03:38.310]  So it's not being asked to deploy, you know, a pod with incorrect specifications
[03:38.310 --> 03:39.110]  or anything.
[03:40.370 --> 03:42.230]  And once it has this data and is happy with it,
[03:42.230 --> 03:45.570]  it gets stored in etcd, which is effectively the golden grail
[03:45.570 --> 03:48.290]  of information within the cluster, because it stores pretty much everything.
[03:48.570 --> 03:51.610]  All the resources that are deployed in the cluster
[03:51.610 --> 03:55.930]  are defined in YAML, which are stored in ETCD.
[03:56.110 --> 03:58.390]  And therefore, if you have write access or read access to this,
[03:58.390 --> 04:01.730]  you have pretty elevated access to the data within the cluster.
[04:01.770 --> 04:05.070]  With write access, you know, you could write yourself new roles
[04:05.070 --> 04:08.150]  or policies that would let you have cluster admin access.
[04:08.150 --> 04:11.810]  And if you have read access, you can just read the secrets for like...
[04:11.810 --> 04:14.790]  Sorry, not read the secrets, just read various data.
[04:14.850 --> 04:18.430]  Secrets are a little bit more interesting as they tend to be encrypted in ETCD.
[04:20.050 --> 04:24.390]  And the way ETCD is built, there's not really an application-level authentication.
[04:24.390 --> 04:32.420]  So you kind of focus more on whitelisting the source addresses
[04:32.420 --> 04:34.160]  that are permitted to talk to it.
[04:34.160 --> 04:37.360]  Typically, it's recommended to only whitelist this to the Kube API server,
[04:37.360 --> 04:40.500]  as that's really the thing that needs to talk to it.
[04:42.360 --> 04:47.840]  As I said earlier, the Kubelet is the component that runs on each node.
[04:47.840 --> 04:52.080]  And this tends to register itself with the API server to say, you know,
[04:52.080 --> 04:58.680]  I'm a node here, please give me work and have permissions within the cluster
[04:58.680 --> 05:01.440]  so I can access the data I need.
[05:01.440 --> 05:03.600]  So for the pod specifications, it needs to deploy pods
[05:03.600 --> 05:06.780]  and any other data that may need to be deployed alongside it,
[05:06.780 --> 05:09.000]  for example, the secrets for those pods.
[05:09.000 --> 05:12.840]  And to have this access, it has a service account to the API server.
[05:12.840 --> 05:19.120]  And in itself, it exposes an API, typically on port 10250.
[05:19.220 --> 05:21.900]  And there have been attacks against this in the past as well.
[05:21.900 --> 05:26.900]  So one of my colleagues, Alexander Keskis, wrote a blog post on this a while back,
[05:26.900 --> 05:29.800]  where he was leveraging anonymous authentication on the Kubelet API
[05:29.800 --> 05:33.620]  then to talk to the API server and get into various pods.
[05:35.700 --> 05:42.100]  Now, within a cloud-managed Kubernetes cluster such as EKS or GKE,
[05:42.840 --> 05:46.920]  the management plane is simply managed by the cloud provider.
[05:46.980 --> 05:50.980]  As such, when we're doing pentest engagements, these are typically considered out of scope.
[05:52.140 --> 05:55.820]  And we kind of trust the cloud provider to manage that appropriately.
[05:55.860 --> 05:59.720]  However, being in the cloud also adds cloud-specific interactions.
[06:00.060 --> 06:05.340]  For example, instance metadata raises its ugly head again, where the VMs,
[06:05.340 --> 06:11.120]  if deployed on VMs within the cloud, it would have access to the instance metadata
[06:11.120 --> 06:18.080]  and theoretically pods could communicate with that to get AWS credentials
[06:18.080 --> 06:22.940]  to talk to the rest of the environment if the nodes have it or anything else.
[06:23.180 --> 06:27.840]  Now, there are various protections for this that are possible, but they're not always in play.
[06:28.580 --> 06:30.640]  And the fact that it is also in the cloud environment,
[06:30.640 --> 06:33.840]  typically IAM is also integrated with access to the cluster.
[06:34.080 --> 06:37.640]  So when authenticating to a Kubernetes cluster, which is managed,
[06:37.640 --> 06:42.200]  typically you're using your cloud provider's IAM or CLI to kind of get access
[06:42.200 --> 06:45.180]  and generate your kube-config files so you can authenticate.
[06:48.970 --> 06:53.770]  So a while back, we were doing an assessment where a couple of our guys noticed
[06:53.770 --> 06:57.250]  that they could talk to the instance metadata within a cluster within GKE.
[06:57.630 --> 07:03.730]  And within the instance metadata, they found tokens for talking to the kube-api server.
[07:04.030 --> 07:09.350]  They enumerated the permissions for this and noticed that they were allowed to create CSRs.
[07:09.350 --> 07:14.030]  And going back to the fact that kubelet self-registered, these CSRs did that.
[07:14.170 --> 07:17.750]  You could essentially create a CSR to send it to the API server,
[07:17.750 --> 07:23.110]  which when it gets signed is your authentication token as a kubelet to the API server.
[07:23.110 --> 07:27.530]  So they could use this to start enumerating the different nodes and get the pods,
[07:27.530 --> 07:31.630]  the secrets within and get to cluster admin in that manner.
[07:31.770 --> 07:35.110]  So effectively from a pod that they can talk to the instance metadata,
[07:35.110 --> 07:38.930]  they escalated themselves quite quickly to cluster admin.
[07:39.350 --> 07:45.290]  Now back then they just did this manually, but recently I got told of a script that kind of did it.
[07:45.770 --> 07:49.910]  I can't remember the person's name. Just a second.
[07:52.250 --> 07:57.490]  A script by Brad Geesaman that's a quick bash script that kind of does this for you
[07:57.490 --> 08:03.250]  and enumerates that data and, you know, gets you the secrets and stuff within the cluster.
[08:04.770 --> 08:10.050]  Now Kubernetes is very much API based. So the API server is pretty much a whole bunch of APIs.
[08:10.110 --> 08:13.170]  And by design, Kubernetes is modular.
[08:13.230 --> 08:17.010]  And we're going to see this throughout the presentation where the way that it's set up
[08:17.010 --> 08:21.710]  allows for the easy adding of new resources or configurations
[08:21.710 --> 08:26.510]  so it can work better in people's individual, you know, configurations.
[08:27.670 --> 08:33.030]  So the ability to add additional APIs is present and it's also useful for identifying technology stacks
[08:33.030 --> 08:35.870]  when looking at it from an offensive perspective.
[08:36.110 --> 08:40.250]  As new technologies, you know, tend to create custom APIs,
[08:40.250 --> 08:44.230]  you can enumerate these APIs and figure out what kind of technologies are in use.
[08:44.450 --> 08:50.970]  So this is the example output of this for a pretty bog standard cluster with some additional things.
[08:50.970 --> 08:54.970]  And from this we can quickly enumerate that, you know, it's using Calico for networking,
[08:54.970 --> 08:57.370]  it's using Linkerd as a service mesh and so on.
[08:57.370 --> 09:00.170]  So it's a quite useful way to enumerate this information.
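The enumeration described here can be sketched in a few lines of Python. This is only an illustration, assuming the input is a list of `group/version` strings such as those printed by `kubectl api-versions`. The `KNOWN_GROUPS` table and the `fingerprint` helper are hypothetical names made up for this sketch, and the table is a small hand-picked sample rather than a complete mapping.

```python
# Hypothetical lookup table: API group suffix -> technology that usually registers it.
# The entries are illustrative; a real cluster may expose many more groups.
KNOWN_GROUPS = {
    "projectcalico.org": "Calico (networking)",
    "linkerd.io": "Linkerd (service mesh)",
    "istio.io": "Istio (service mesh)",
    "cert-manager.io": "cert-manager (certificates)",
}


def fingerprint(api_versions):
    """Return the sorted list of technologies hinted at by the API groups."""
    found = set()
    for entry in api_versions:
        group = entry.split("/")[0]  # drop the "/v1" style version suffix
        for suffix, technology in KNOWN_GROUPS.items():
            if group == suffix or group.endswith("." + suffix):
                found.add(technology)
    return sorted(found)


# Made-up sample input in the "group/version" shape.
sample = [
    "apps/v1",
    "rbac.authorization.k8s.io/v1",
    "crd.projectcalico.org/v1",
    "linkerd.io/v1alpha2",
]
print(fingerprint(sample))
```

Matching on the group suffix keeps the table short, since one project often registers several sub-groups under the same domain.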
[09:03.180 --> 09:07.060]  So Kubernetes uses RBAC for permission control.
[09:07.380 --> 09:11.540]  It's generally found to use RBAC. There are other ways it can be done, such as ABAC.
[09:11.800 --> 09:14.860]  But most of the time it's RBAC.
[09:15.000 --> 09:19.040]  And within RBAC there are three main high-level concepts within Kubernetes.
[09:19.180 --> 09:21.160]  So the first one is the targets.
[09:21.280 --> 09:23.080]  So these are what the permissions apply to.
[09:23.080 --> 09:25.360]  So your users, your groups, your service accounts.
[09:25.580 --> 09:27.360]  And then these get assigned roles.
[09:27.360 --> 09:32.480]  And the roles grant you those permissions in, you know, the various areas of the cluster.
[09:32.800 --> 09:36.940]  This can be limited by the API groups, the resource, the verb.
[09:36.940 --> 09:40.320]  So, you know, whether you can create it or just get it or whatever.
[09:40.320 --> 09:42.520]  And potentially also the resource name.
[09:42.820 --> 09:49.320]  And now these are bound to those targets using role bindings, which basically map roles onto your targets.
[09:50.300 --> 09:56.860]  Now these RBAC permissions can either apply cluster-wide or within a single namespace.
[09:56.860 --> 10:02.920]  And the way that kind of gets differentiated is with, you know, the role slash cluster role or role binding slash cluster role binding.
[10:02.920 --> 10:07.800]  So if it were a cluster role and a cluster role binding, the permissions applied would be cluster-wide.
[10:07.840 --> 10:12.940]  And all resources that, you know, are allowed by that role would be allowed to that target.
[10:12.940 --> 10:17.980]  However, if it's, you know, using the role binding, then it's only within a specific namespace.
[10:17.980 --> 10:19.660]  Those permissions are granted.
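The difference between a cluster-wide and a namespaced binding can be modelled with a toy evaluator. This is a simplified sketch, not how the API server is implemented: the `Rule`, `Binding`, and `allowed` names are hypothetical, subjects are plain strings, and a binding with `namespace=None` stands in for a cluster role binding.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Rule:
    api_groups: Tuple[str, ...]
    resources: Tuple[str, ...]
    verbs: Tuple[str, ...]


@dataclass(frozen=True)
class Binding:
    subject: str                # user, group, or service account name
    rules: Tuple[Rule, ...]     # rules of the referenced role
    namespace: Optional[str]    # None models a cluster-wide binding


def _match(allowed_values, value):
    # "*" is treated as a wildcard, as in RBAC rules.
    return "*" in allowed_values or value in allowed_values


def allowed(bindings, subject, verb, api_group, resource, namespace):
    """RBAC is additive: any matching rule allows, otherwise deny."""
    for binding in bindings:
        if binding.subject != subject:
            continue
        if binding.namespace is not None and binding.namespace != namespace:
            continue
        for rule in binding.rules:
            if (_match(rule.api_groups, api_group)
                    and _match(rule.resources, resource)
                    and _match(rule.verbs, verb)):
                return True
    return False


read_pods = Rule(api_groups=("",), resources=("pods",), verbs=("get", "list"))
bindings = [
    Binding("alice", (read_pods,), namespace="dev"),  # namespaced binding
    Binding("bob", (read_pods,), namespace=None),     # cluster-wide binding
]
print(allowed(bindings, "alice", "get", "", "pods", "dev"))   # within the bound namespace
print(allowed(bindings, "alice", "get", "", "pods", "prod"))  # outside it
print(allowed(bindings, "bob", "list", "", "pods", "prod"))   # cluster-wide
```

The sketch leaves out resource names and group membership to stay short.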
[10:21.020 --> 10:27.960]  Now, of those three, you know, kind of targets I kind of mentioned, the only one really managed by Kubernetes is the service account.
[10:28.040 --> 10:30.600]  And these are authenticated using bearer tokens.
[10:30.600 --> 10:38.480]  And they are, you know, fully managed within Kubernetes, typically, with service accounts being created as a resource.
[10:38.480 --> 10:41.900]  The authentication credentials are mounted to pods.
[10:41.900 --> 10:45.980]  So they have access to those service accounts, which they can use to talk to the cluster.
[10:45.980 --> 10:48.820]  Users and groups are not managed by Kubernetes.
[10:48.820 --> 10:55.040]  And it's typically left to some sort of external service with authentication modules within Kubernetes,
[10:55.040 --> 11:00.860]  providing that kind of access to those authentication mechanisms so it can kind of, you know, authenticate users.
[11:01.240 --> 11:06.580]  And it kind of trusts those third-party systems to be correct and just follows it.
[11:06.580 --> 11:13.160]  So, for example, using X.509 certificates, the common name provides the user and the organization, you know,
[11:13.160 --> 11:16.980]  within a certificate, tells Kubernetes what groups it's in.
[11:16.980 --> 11:22.420]  And it just validates that it's been signed by the appropriate certificate authority and says, yep, that's cool with me.
[11:22.420 --> 11:33.300]  Or, for example, if you're running a Windows-based environment, it can also use the Active Directory as a kind of source of truth for what users are allowed or not.
[11:35.420 --> 11:43.460]  So service accounts being, you know, in Kubernetes, when you create a new namespace, which is the way people tend to do segregation,
[11:43.540 --> 11:47.240]  a new default service account is created within that namespace.
[11:47.240 --> 11:52.900]  And this default service account is automatically mounted into every pod within that namespace by default.
[11:53.500 --> 11:57.520]  Now, this service account does not have any permissions directly applied to it.
[11:57.520 --> 12:01.760]  However, if it's in any groups, then it would have those permissions as well.
[12:01.760 --> 12:05.960]  But then this is typically something you should avoid.
[12:06.080 --> 12:12.360]  Because if you do assign any permissions to this default service account for one pod, and another pod is also using it,
[12:12.360 --> 12:17.100]  then that other pod would also have those permissions, you know, kind of breaking the principles of least privilege.
[12:18.200 --> 12:26.180]  So it's typically recommended to disable the auto-mounting of these default service accounts and to create bespoke service accounts each time you need one.
[12:27.700 --> 12:35.120]  Now, the groups that, you know, the default service account may be in, there are three groups it's in by default.
[12:35.440 --> 12:40.080]  So that's system:authenticated, which is for pretty much every authenticated entity within the cluster,
[12:40.080 --> 12:43.100]  system:serviceaccounts, which is the group for all service accounts,
[12:43.100 --> 12:49.200]  and then system:serviceaccounts:<namespace>, which is for all service accounts within that particular namespace.
[12:49.260 --> 12:54.360]  So if any of those three groups have any permissions assigned to them, that default service account will also have it.
[12:54.360 --> 12:59.500]  There's also a system:unauthenticated group. So if, you know, the cluster allows anonymous authentication,
[13:00.140 --> 13:06.680]  the user, well, not really a user, but the person who is communicating with the cluster will have the same permissions
[13:06.680 --> 13:09.520]  as would be granted by system:unauthenticated.
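The group membership described here can be written down as a small Python sketch. The helper names (`service_account_username`, `service_account_groups`, `effective_roles`) and the `grants` dictionary are hypothetical and exist only for illustration. The identity strings follow the usual Kubernetes naming for service accounts.

```python
def service_account_username(namespace, name):
    """Username form used for service accounts."""
    return f"system:serviceaccount:{namespace}:{name}"


def service_account_groups(namespace):
    """Groups every service account in the namespace belongs to by default."""
    return [
        "system:serviceaccounts",
        f"system:serviceaccounts:{namespace}",
        "system:authenticated",
    ]


def effective_roles(namespace, name, grants):
    """Union of the roles granted to the account itself and to any of its groups.

    `grants` is a made-up mapping of identity -> list of role names.
    """
    identities = [service_account_username(namespace, name)]
    identities += service_account_groups(namespace)
    roles = set()
    for identity in identities:
        roles.update(grants.get(identity, ()))
    return roles


# Nothing is bound to the default service account directly,
# but a group it belongs to has been given a role.
grants = {"system:serviceaccounts": ["cluster-admin"]}
print(effective_roles("dev", "default", grants))
```

The sketch shows why a role bound to one of these groups reaches every pod that mounts a service account token, even when the account itself has no direct bindings.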
[13:10.700 --> 13:13.780]  So there was a time not too long ago where we were on assessment,
[13:13.780 --> 13:20.500]  and the system:serviceaccounts group was actually over-provisioned, granting cluster admin access.
[13:20.500 --> 13:24.740]  And therefore, you know, as soon as we compromised a single pod in the cluster,
[13:25.240 --> 13:29.440]  we were able to use that service account, which was mounted into every pod by default.
[13:29.780 --> 13:33.880]  And we very quickly got cluster admin into the cluster in that manner.
[13:36.380 --> 13:41.620]  Now, when enumerating permissions, there are a bunch of tools I tend to use when enumerating permissions.
[13:41.660 --> 13:46.360]  I think the one I use the most is a tool called rakkess, GitHub link on the slides.
[13:46.360 --> 13:52.240]  And essentially, this takes, you know, your token that you used for authentication or whatever,
[13:52.240 --> 13:56.460]  and it kind of enumerates all the permissions that are available and then gives you a nice tick or a cross
[13:57.040 --> 14:02.180]  in a tabular format showing, you know, what you are and aren't allowed to do,
[14:02.180 --> 14:06.820]  which, you know, if you've just compromised a single JWT token to authenticate for a service account,
[14:06.820 --> 14:09.320]  it's great to kind of figure out what you can and cannot do.
[14:10.540 --> 14:15.740]  But then, you know, when doing more of a review of RBAC, there's a tool called RBAC Lookup,
[14:15.740 --> 14:20.980]  which kind of lists through all the subjects, all the targets, the roles and role bindings
[14:20.980 --> 14:25.360]  that are present in the cluster. So, you know, if we've identified a role that is, you know,
[14:25.360 --> 14:29.900]  considered risky in a way, so for example, it grants you excessive permissions,
[14:29.900 --> 14:35.700]  we can quickly identify what targets have that role assigned to them.
[14:36.920 --> 14:44.920]  The final tool I'd like to quickly mention is KubiScan, which kind of takes the approach a bit more automated,
[14:44.920 --> 14:50.620]  where it automatically tries to find those risky accounts, whether these are just our cluster admin by default
[14:50.620 --> 14:58.140]  or they have permissions that could easily help them get to a cluster admin level of access.
[14:58.140 --> 15:04.180]  So moving on to networking, there are a variety of things you need to consider with networking.
[15:04.180 --> 15:11.880]  Now, Kubernetes, as I said earlier, being very modular, the actual networking interface you use is actually kind of up to you.
[15:11.880 --> 15:17.080]  Kubernetes say that, you know, it should follow the CNI container networking interface kind of specification.
[15:17.220 --> 15:23.520]  But apart from that, we're Gucci. So within that, you know, there are a bunch of things we need to kind of look into,
[15:23.520 --> 15:28.080]  like pod-to-service, ingress, pod-to-pod, and the CNI itself.
[15:28.200 --> 15:33.000]  Now, because CNI is pretty much freedom, there are a lot of CNIs out there,
[15:33.000 --> 15:38.480]  and each of them provide different functionality and features.
[15:39.220 --> 15:46.640]  So there are, you know, the cloud network-based CNIs, which kind of integrate with your cloud providers.
[15:46.640 --> 15:51.480]  Networking interfaces like AWS VPC would, you know, kind of put you within the VPCs
[15:51.480 --> 15:55.100]  as opposed to a weird overlay network within the cluster itself.
[15:55.400 --> 16:00.820]  And then there are others that, you know, kind of provide in-transit encryption or other features.
[16:01.380 --> 16:03.020]  And there's a whole bunch.
[16:04.580 --> 16:09.240]  For traffic to get into the cluster, typically, you would find an ingress controller.
[16:09.240 --> 16:14.320]  Some alternative methods are like a node port, which is just, you know, a high-numbered port,
[16:14.320 --> 16:17.540]  usually above 30,000, where a service is exposed.
[16:17.640 --> 16:21.700]  But more often than not, you're going to find some sort of ingress controller,
[16:21.700 --> 16:25.920]  where it's typically a pod, such as, you know, Trafic or Nginx or something,
[16:25.920 --> 16:31.640]  that has permissions within the cluster to list the services, which are the way you expose pods in a cluster,
[16:31.640 --> 16:36.400]  and then gets traffic coming in and then tends to route them around internally in the cluster
[16:36.400 --> 16:41.060]  to the correct location or the correct service.
[16:43.160 --> 16:45.880]  DNS for resolving within the cluster.
[16:46.480 --> 16:51.580]  Now, if it's just a pod, it's automatically got a DNS name with the pod's IP address,
[16:51.580 --> 16:55.420]  the name of the namespace, .pod.cluster.local.
[16:55.460 --> 17:00.780]  However, it's usually not common for people to use the pod's DNS name for resolution.
[17:00.780 --> 17:05.080]  It's more common to use a headless service on top of that,
[17:05.080 --> 17:08.200]  where after you've created a pod with the labels appropriate to it,
[17:08.200 --> 17:13.100]  you create a headless service pointing to those pods, and it kind of simplifies that entire process,
[17:13.100 --> 17:16.780]  as it can also do a load balancing between the various pods.
[17:17.580 --> 17:22.180]  And typically, this is called, you know, service-name.namespace.svc.cluster.local.
[17:22.180 --> 17:29.340]  And because the search space for the DNS queries includes namespace.svc.cluster.local and svc.cluster.local,
[17:29.340 --> 17:32.820]  you can just do a query for service if you're in the same namespace,
[17:32.820 --> 17:35.600]  or if you're in a separate namespace, service.namespace.
[17:36.140 --> 17:40.520]  There's an additional DNS name that's usually also added, and that's hostname.subdomain.
[17:40.520 --> 17:44.000]  And the hostname and subdomain is kind of extracted from the pod.
[17:44.000 --> 17:51.160]  So the pod can specify the hostname and subdomain at which point it will be available using that kind of DNS name,
[17:51.160 --> 17:54.240]  which kind of brings the idea of spoofing into play.
[17:54.260 --> 17:58.540]  So, for example, if I were to set the subdomain to, say, com and the hostname to google,
[17:58.540 --> 18:08.500]  suddenly I have a DNS queryable name, google.com, which would complete and resolve to an internal IP address.
[18:08.960 --> 18:14.260]  So if you could, you know, somehow get yourself a google.com certificate, which I doubt you're going to get an external CA to do it,
[18:14.260 --> 18:19.100]  but if you're using some sort of internal CA, it could let you do some interesting attacks in that manner,
[18:19.100 --> 18:21.320]  or man-in-the-middling other connections.
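Why a short name such as `google.com` can end up resolving inside the cluster comes down to the resolver's search list. The sketch below assumes the typical pod resolver configuration (search domains for the namespace, `svc`, and the cluster domain, plus `ndots:5`). The function name `search_candidates` is hypothetical, and the code only lists the names a resolver would try, in order; it performs no DNS lookups.

```python
def search_candidates(name, namespace, cluster_domain="cluster.local", ndots=5):
    """Names a stub resolver would try, in order, for `name` inside a pod.

    Assumes the usual pod search list; real clusters may add more domains.
    """
    search = [
        f"{namespace}.svc.{cluster_domain}",
        f"svc.{cluster_domain}",
        cluster_domain,
    ]
    if name.endswith("."):
        # A trailing dot marks the name as fully qualified: no search list.
        return [name]
    expanded = [f"{name}.{suffix}" for suffix in search]
    if name.count(".") >= ndots:
        # Enough dots: try the name as given first, then the search list.
        return [name] + expanded
    # Fewer dots than ndots: the search list is tried before the literal name.
    return expanded + [name]


for candidate in search_candidates("google.com", "default"):
    print(candidate)
```

Because `google.com` has fewer dots than `ndots`, the cluster-internal candidates are tried first, which is what makes the hostname and subdomain trick interesting. Querying `google.com.` with a trailing dot would skip the search list.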
[18:23.260 --> 18:28.680]  Now, the internal network is typically flat by default, because Kubernetes has this thing of, you know,
[18:28.680 --> 18:33.340]  things should be able to talk to each other to facilitate, you know, microservices-based architecture,
[18:34.040 --> 18:38.340]  which is great from, you know, a developer's point of view, where they can deploy their, you know,
[18:38.340 --> 18:43.020]  series of pods and have them communicate in a way they need to, and it works.
[18:43.060 --> 18:46.020]  However, from a security point of view, it adds a lot of issues,
[18:46.020 --> 18:49.960]  because the pods in the cluster aren't only going to be the application pods,
[18:49.960 --> 18:52.540]  they're also going to be, you know, a variety of management pods, etc.,
[18:52.540 --> 18:56.080]  which if you can talk to, you can potentially do some dangerous things.
[18:57.000 --> 19:01.680]  So, to kind of support locking down what kind of access pods have in the network,
[19:01.680 --> 19:02.900]  there are network policies.
[19:03.540 --> 19:06.400]  Now, network policies, effectively, are kind of like firewall rules,
[19:06.400 --> 19:11.000]  where you can say, you know, these pods are allowed to talk to these pods or these CIDR ranges and nothing else,
[19:14.440 --> 19:19.940]  and they're only allowed to, you know, get incoming traffic from these locations that are,
[19:19.940 --> 19:22.240]  you know, permitted by the administrators.
[19:23.460 --> 19:24.860]  So, it's kind of great.
[19:24.880 --> 19:29.440]  Now, by default, there is no, like, deny rule or anything within network policies.
[19:29.440 --> 19:33.920]  So, all the network policies are kind of like allow this connection.
[19:34.240 --> 19:38.180]  So, the way it tends to work is if there are no network policies within the namespace,
[19:38.620 --> 19:40.060]  everything's a flat network by default.
[19:40.060 --> 19:44.120]  But as soon as you start adding the first network policy, it starts becoming,
[19:44.120 --> 19:48.180]  you need to have a network policy allow that connection before it's permitted.
[19:48.180 --> 19:51.440]  So, your default deny, instead of being a rule that says deny all connections,
[19:51.440 --> 19:54.400]  is a network policy with zero rules in it.
[19:54.440 --> 19:57.080]  And then you can kind of add the rules in as you need it.
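The allow-only behaviour of network policies can be sketched as a small decision function. This is a simplified model with made-up data shapes (plain dictionaries with a `podSelector` and a list of allowed source selectors), not the real NetworkPolicy schema. It only covers ingress within one namespace, and it assumes isolation applies to the pods a policy selects.

```python
def _selects(selector, labels):
    """An empty selector matches every pod; otherwise all pairs must match."""
    return all(labels.get(key) == value for key, value in selector.items())


def ingress_allowed(policies, dest_labels, source_labels):
    """Decide whether traffic from a source pod to a destination pod is allowed."""
    selecting = [p for p in policies if _selects(p["podSelector"], dest_labels)]
    if not selecting:
        # No policy selects the destination: the network stays flat.
        return True
    # At least one policy applies, so something must explicitly allow the traffic.
    for policy in selecting:
        for allowed_source in policy["ingress"]:
            if _selects(allowed_source, source_labels):
                return True
    return False


# "Default deny": selects every pod and contains zero allow rules.
default_deny = {"podSelector": {}, "ingress": []}
# Allow rule added on top: frontend pods may talk to db pods.
allow_frontend_to_db = {"podSelector": {"app": "db"}, "ingress": [{"app": "frontend"}]}

db, frontend, other = {"app": "db"}, {"app": "frontend"}, {"app": "batch"}
print(ingress_allowed([], db, other))                                      # flat network
print(ingress_allowed([default_deny], db, frontend))                       # everything denied
print(ingress_allowed([default_deny, allow_frontend_to_db], db, frontend)) # explicitly allowed
```

There is no deny rule in this model: the only way to block traffic is for a policy to select the pod and for no rule to allow the connection.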
[19:57.880 --> 20:04.280]  Now, the internal network, it depends on the CNI you use, but most of them don't allow encryption in transit,
[20:04.280 --> 20:05.640]  or they don't support it.
[20:06.000 --> 20:08.800]  So, there is this concept of service meshes.
[20:09.480 --> 20:13.480]  Now, service meshes, they typically provide encryption in transit,
[20:13.480 --> 20:17.640]  but a whole bunch of other stuff like monitoring and other metrics that, you know,
[20:17.640 --> 20:21.400]  your administrators can use to kind of help debug issues in the cluster
[20:21.400 --> 20:24.440]  or just have an oversight of what's actually going on.
[20:24.620 --> 20:29.160]  And the way this is typically done is through a sidecar container of a pod.
[20:29.160 --> 20:31.540]  So, a pod is one or more containers.
[20:31.580 --> 20:34.220]  So, this just would be another container in that pod.
[20:35.340 --> 20:39.300]  And typically, these are automatically injected into the pod at runtime.
[20:39.300 --> 20:46.020]  So, when that pod gets created, admission controllers are kind of used to inject those containers.
[20:46.540 --> 20:51.600]  Now, admission controllers are functionalities within the API server that basically allow,
[20:51.600 --> 20:57.880]  you know, modular-based additional features or checks to be added to that validation process
[20:57.880 --> 20:59.500]  or request as they come in.
[20:59.540 --> 21:04.800]  So, they can, you know, be used to write custom policies, for example, using OPA,
[21:04.800 --> 21:11.040]  which is a policy as code tool, to provide custom checks that the API server can then use
[21:11.040 --> 21:13.320]  to validate the data that's coming in.
[21:13.980 --> 21:16.140]  Now, there are two types of admission controls.
[21:16.140 --> 21:18.080]  There's mutating and validating.
[21:18.440 --> 21:22.600]  Validating just validates the data and says, you know, this looks all right.
[21:22.600 --> 21:23.960]  Yeah, I'm happy with this.
[21:24.200 --> 21:27.580]  Mutating takes it a little bit further, where if it's not happy,
[21:27.580 --> 21:31.100]  it has the option to change the request as it comes in.
[21:31.100 --> 21:33.500]  So, for example, you have a pod coming in.
[21:33.500 --> 21:35.760]  It feels like changing something.
[21:35.760 --> 21:38.260]  It's well within its rights to do so.
[21:39.360 --> 21:45.220]  So, the overall flow looks something like this, where when a request comes into the API server,
[21:45.220 --> 21:49.960]  first, the API server would, you know, make sure you're allowed to do what you're trying to do in the first place,
[21:49.960 --> 21:54.700]  through RBAC, and make sure, you know, you are a person or service account.
[21:54.920 --> 21:59.360]  It would then pass it to the mutating admission controllers to see whether there's any changes they would like to make
[21:59.360 --> 22:01.100]  and if they are happy with it.
[22:01.540 --> 22:05.940]  Once all the changes are made, make sure it still meets, you know, the object schema validation.
[22:06.160 --> 22:08.780]  So, make sure it's still a valid resource.
[22:09.000 --> 22:13.960]  And then finally, passes it to the validating admission controls to make sure there are...
[22:13.960 --> 22:21.080]  or run all the final checks and make sure it's still a valid piece of thing to save before saving it into etcd,
[22:21.480 --> 22:24.820]  at which point the controller managers and schedulers can start, you know,
[22:24.820 --> 22:27.820]  using that data to make the changes within the cluster as needed.
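The order of steps in this flow can be sketched as a small pipeline. Everything here is hypothetical scaffolding (`handle_request`, `Rejected`, the callables passed in, and the list standing in for etcd). The only thing it tries to capture is the ordering: authorization, mutating admission, schema validation, validating admission, then persistence.

```python
class Rejected(Exception):
    """Raised when any stage refuses the request."""


def handle_request(obj, user, authorize, mutators, schema_ok, validators, store):
    # 1. Authentication/authorization comes first.
    if not authorize(user, obj):
        raise Rejected("forbidden")
    # 2. Mutating admission controllers may change the object.
    for mutate in mutators:
        obj = mutate(obj)
    # 3. The (possibly changed) object must still be a valid resource.
    if not schema_ok(obj):
        raise Rejected("invalid object")
    # 4. Validating admission controllers can only accept or reject.
    for validate in validators:
        if not validate(obj):
            raise Rejected("denied by validating admission")
    # 5. Only now is the object persisted (a list stands in for etcd).
    store.append(obj)
    return obj


def inject_sidecar(pod):
    # Example mutator: add a made-up "proxy" container, as a service mesh might.
    return {**pod, "containers": pod["containers"] + ["proxy"]}


def no_privileged(pod):
    # Example validator: refuse pods that ask for privileged mode.
    return not pod.get("privileged", False)


store = []
pod = {"kind": "Pod", "containers": ["app"]}
saved = handle_request(
    pod, "alice",
    authorize=lambda user, obj: user == "alice",
    mutators=[inject_sidecar],
    schema_ok=lambda obj: obj.get("kind") == "Pod" and len(obj["containers"]) > 0,
    validators=[no_privileged],
    store=store,
)
print(saved["containers"])
```

Running validation after mutation matters: the validators see the object as it will actually be stored, including anything a mutating controller injected.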
[22:29.620 --> 22:32.200]  Now, it is possible to create your own admission controls,
[22:32.200 --> 22:36.640]  but there are a whole bunch of inbuilt admission controllers that are there by default,
[22:36.640 --> 22:38.720]  which can be enabled at the API server.
[22:38.960 --> 22:41.260]  Now, there are a whole bunch of security admission controls as well,
[22:41.260 --> 22:43.640]  but most of these tend not to be enabled by default.
[22:43.640 --> 22:48.780]  So, they usually need to be manually enabled in the API server if it's a self-managed cluster.
[22:49.300 --> 22:52.060]  So, for example, pod security policy, which we're going to be talking about later,
[22:52.060 --> 22:53.440]  or always pull images.
[22:54.460 --> 23:00.660]  Always pull images is essentially a mutating controller where as pod specifications come through,
[23:00.660 --> 23:06.440]  it makes sure that the image pull policy is set to Always as opposed to the default of IfNotPresent.
[23:06.940 --> 23:12.140]  This is especially important in like multi-tenant clusters where if it's set to if not present
[23:12.140 --> 23:16.880]  and you have two separate tenants where one has permissions over an image and the other one doesn't,
[23:16.880 --> 23:22.520]  the one that doesn't have permission over that image could use that image without having the relevant permissions,
[23:22.520 --> 23:25.020]  if that image has already been pulled onto the node.
[23:25.200 --> 23:30.680]  However, if the image pull policy is set to always, it would only allow the deployment of that pod
[23:30.680 --> 23:35.640]  if that tenant could pull that image in the first place, which is kind of a way to prevent them using images
[23:35.640 --> 23:36.880]  they're not permitted to.
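
As a rough sketch of the mutation described here, the function below forces the pull policy on every container in a pod specification. It only assumes the standard pod fields `spec.containers`, `spec.initContainers`, and `imagePullPolicy`; the function name `always_pull_images` is made up for this example, and this is not the real admission controller's code.

```python
import copy
from typing import Any, Dict


def always_pull_images(pod: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the pod with imagePullPolicy forced to Always."""
    mutated = copy.deepcopy(pod)  # leave the caller's object untouched
    spec = mutated.get("spec", {})
    for field in ("initContainers", "containers"):
        for container in spec.get(field, []):
            # Overwrite whatever the tenant asked for, including IfNotPresent.
            container["imagePullPolicy"] = "Always"
    return mutated


pod = {
    "kind": "Pod",
    "spec": {
        "containers": [
            {"name": "app", "image": "registry.example/private/app:1.0",
             "imagePullPolicy": "IfNotPresent"},
        ]
    },
}
print(always_pull_images(pod)["spec"]["containers"][0]["imagePullPolicy"])
```

With the policy forced to Always, the node has to go back to the registry with the tenant's own pull credentials, so an image cached on the node by another tenant is no longer enough to start the pod.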
[23:37.700 --> 23:40.240]  And there's a whole bunch of admission controls like this.
[23:40.320 --> 23:44.700]  In fact, having mutating and validating admission controllers, custom ones,
[23:44.700 --> 23:48.720]  requires the inbuilt admission controllers for the same thing to be enabled.
[23:48.720 --> 23:58.820]  So there was an assessment we did a while back where we actually ran into OPA as a mutating admission controller
[23:58.820 --> 24:01.140]  and within the network, it was a flat network.
[24:01.140 --> 24:04.320]  So we could talk to the web front end for it.
[24:04.320 --> 24:09.920]  And we found it in the cluster because the pod we had compromised had over-provisioned groups.
[24:09.940 --> 24:15.120]  So we could start reading resources within the cluster because they had read-only access.
[24:15.120 --> 24:20.820]  We could have also guessed the service name because it was just OPA.OPA, but that's boring.
[24:21.760 --> 24:25.880]  Now, because it was a mutating admission controller, if we could get policies into that,
[24:25.880 --> 24:28.780]  we could change the specification of pods in the cluster.
[24:28.920 --> 24:32.400]  And luckily for us, there was no authentication for OPA configured.
[24:33.140 --> 24:39.480]  So a quick policy pushed up to OPA and waiting for pods meant that we could suddenly start backdooring pods
[24:39.480 --> 24:43.780]  in other areas of the cluster and start moving around the cluster in that manner.
[24:46.000 --> 24:50.240]  Now, when talking about Kubernetes security, one thing that you kind of can't forget is the fact that
[24:50.240 --> 24:51.980]  they are using containers under the hood.
[24:52.460 --> 24:55.480]  And therefore, things like container breakouts do apply here.
[24:55.480 --> 25:01.020]  Now, I won't be going into too much detail about, you know, all the ways that are possible for container breakouts,
[25:01.020 --> 25:04.580]  but a couple of common ones are like, you know, leveraging host volume mounts,
[25:04.580 --> 25:08.160]  where, you know, folders from the host file system have been mounted into the container
[25:08.160 --> 25:13.300]  and you can kind of use those to add, like, an SSH key if you have someone's home folder
[25:13.300 --> 25:19.240]  or, if /var/run is mounted, talking to the Docker socket if you're using Docker, or all sorts of things.
[25:19.680 --> 25:24.540]  There's also, you know, kernel exploits like Dirty COW if, you know, it's an out-of-date kernel version
[25:24.540 --> 25:27.300]  or also overly permissive kernel capabilities.
[25:27.300 --> 25:31.480]  So if, you know, the privileged flag has been used, you have the ability to load kernel modules.
[25:31.480 --> 25:34.380]  So you can kind of load a kernel module, get onto the host that way.
[25:34.380 --> 25:42.480]  And once you have that container breakout set up, because typically one node runs multiple pods,
[25:42.480 --> 25:46.080]  as soon as you've broken out of one pod, you suddenly have access
[25:46.080 --> 25:48.980]  to the other pods sharing that node.
[25:49.260 --> 25:53.120]  Now to help mitigate container breakouts, there's pod security policy, once you have
[25:53.120 --> 25:56.060]  the admission controller enabled for it.
[25:56.680 --> 26:00.760]  Excuse me. And that kind of helps mitigate a lot of container breakout attacks
[26:01.740 --> 26:05.880]  by limiting what you can actually do with, you know, volumes, your user IDs that you're allowed
[26:05.880 --> 26:08.620]  to run with, the kernel capabilities and so on and so forth.
[26:09.380 --> 26:13.680]  Now, the way these are used is you would create a PSP or pod security policy.
[26:13.860 --> 26:16.340]  And then through RBAC, you would assign that to your users.
[26:16.340 --> 26:19.600]  And those users, when deploying the pod, would have to follow it, you know,
[26:19.600 --> 26:24.240]  one of those PSPs if they... yeah.
[26:25.040 --> 26:28.400]  Now, one of the things I typically recommend with this is, you know, using the
[26:29.260 --> 26:33.560]  system:authenticated group where you create the most restrictive PSP, you know, you can,
[26:33.560 --> 26:36.400]  and then assign that to system:authenticated. At that point, you know, that should apply
[26:36.400 --> 26:41.880]  cluster-wide. And then where more permissive pod security policies are required,
[26:41.880 --> 26:45.120]  those can be, you know, assigned when and where they're needed.
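
To make the kind of restriction a pod security policy expresses more concrete, here is a simplified sketch of a check over volumes, the privileged flag, the user a container runs as, and added kernel capabilities. The policy keys (`allow_privileged`, `allowed_volume_types`, `require_non_root`, `allowed_capabilities`) and the `violations` function are hypothetical and are not the real PodSecurityPolicy schema or admission logic; only the pod fields follow the standard pod specification.

```python
from typing import Any, Dict, List

# Hypothetical "most restrictive" baseline, in the spirit of the policy
# the talk suggests binding to system:authenticated.
RESTRICTIVE: Dict[str, Any] = {
    "allow_privileged": False,
    "allowed_volume_types": {"configMap", "secret", "emptyDir"},
    "require_non_root": True,
    "allowed_capabilities": set(),
}


def violations(pod: Dict[str, Any], policy: Dict[str, Any]) -> List[str]:
    """List the ways a pod spec breaks the (simplified) policy."""
    problems: List[str] = []
    spec = pod.get("spec", {})
    pod_sc = spec.get("securityContext", {})
    for volume in spec.get("volumes", []):
        # In a pod spec, the volume type is the key next to "name".
        for vtype in (key for key in volume if key != "name"):
            if vtype not in policy["allowed_volume_types"]:
                problems.append(f"volume type {vtype} is not allowed")
    for container in spec.get("containers", []):
        name = container.get("name", "?")
        sc = container.get("securityContext", {})
        if sc.get("privileged") and not policy["allow_privileged"]:
            problems.append(f"{name}: privileged containers are not allowed")
        non_root = sc.get("runAsNonRoot", pod_sc.get("runAsNonRoot", False))
        if policy["require_non_root"] and not non_root:
            problems.append(f"{name}: must set runAsNonRoot")
        for cap in sc.get("capabilities", {}).get("add", []):
            if cap not in policy["allowed_capabilities"]:
                problems.append(f"{name}: capability {cap} is not allowed")
    return problems
```

A pod that mounts a `hostPath` volume and sets `privileged: true` would fail on both counts here, which lines up with the two breakout routes mentioned earlier (host volume mounts and loading kernel modules).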
[26:48.280 --> 26:53.020]  On the same level, if you can't, you know, if you can't prevent container breakout
[26:53.020 --> 26:56.320]  for whatever reason, and an attacker somehow still manages it,
[26:56.320 --> 26:59.880]  you can at least start controlling where pods are within a cluster
[27:00.380 --> 27:03.620]  to kind of mitigate the impact of that compromise.
[27:03.620 --> 27:06.940]  So let's say in the cluster, there are pods that have, you know,
[27:06.940 --> 27:09.900]  just standard basic brochureware applications.
[27:09.920 --> 27:13.620]  There are ones that handle massive business critical data,
[27:13.620 --> 27:16.060]  and all sorts of, you know, management pods.
[27:16.400 --> 27:20.980]  You can start categorizing nodes to, you know, hold different types of pods.
[27:20.980 --> 27:24.700]  So you can have a whole bunch of nodes allocated to the brochureware applications,
[27:24.700 --> 27:28.120]  all the stuff you don't really care about, a whole bunch of nodes allocated to,
[27:28.120 --> 27:31.780]  you know, the business sensitive pods that handle very critical data that
[27:31.780 --> 27:36.340]  you really don't want compromised, another set of nodes for the management plane,
[27:36.340 --> 27:37.560]  and so on and so forth.
[27:37.600 --> 27:40.700]  And these can kind of be enforced through taints and tolerations,
[27:40.700 --> 27:44.480]  where you can apply these to the nodes and pods, and then where a pod
[27:44.480 --> 27:48.940]  can be deployed to is limited based on these restrictions.
[27:49.500 --> 27:52.220]  And then these can also be applied at the namespace layer,
[27:52.220 --> 27:56.760]  where you can say that all pods within this namespace need to follow these tolerations.
[27:56.860 --> 28:00.600]  And that way you can kind of, from a cluster administrative perspective,
[28:00.600 --> 28:05.040]  manage what the impact could be of a container breakout should it happen.
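
The matching between node taints and pod tolerations can be sketched as below. This is a simplified model of the documented matching rules, not the scheduler's code; the function names `tolerates` and `can_schedule` and the `tier` key are invented for the example.

```python
from typing import Dict, List

Taint = Dict[str, str]
Toleration = Dict[str, str]


def tolerates(toleration: Toleration, taint: Taint) -> bool:
    """True if a single toleration matches a single taint."""
    # An empty or missing effect on the toleration matches any effect.
    if toleration.get("effect") and toleration["effect"] != taint["effect"]:
        return False
    if toleration.get("operator", "Equal") == "Exists":
        # Exists with no key tolerates every taint; with a key, only that key.
        return not toleration.get("key") or toleration["key"] == taint["key"]
    return (toleration.get("key") == taint["key"]
            and toleration.get("value") == taint.get("value"))


def can_schedule(tolerations: List[Toleration], taints: List[Taint]) -> bool:
    """A pod fits only if every hard taint on the node is tolerated."""
    return all(
        any(tolerates(t, taint) for t in tolerations)
        for taint in taints
        # PreferNoSchedule is only a soft preference, so it is skipped here.
        if taint["effect"] in ("NoSchedule", "NoExecute")
    )


sensitive_node = [{"key": "tier", "value": "sensitive", "effect": "NoSchedule"}]
brochure_pod: List[Toleration] = []
sensitive_pod = [{"key": "tier", "operator": "Equal",
                  "value": "sensitive", "effect": "NoSchedule"}]
print(can_schedule(brochure_pod, sensitive_node))
print(can_schedule(sensitive_pod, sensitive_node))
```

Note that a taint only keeps non-tolerating pods off a node. Keeping the sensitive pods off the brochureware nodes as well would need those nodes tainted too, or a node selector or affinity on the pods.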
